In a very first step, you need to tell R which packages you want to load. The attached packages will be required for the analyses documented in this script
library(readxl)
library(corrplot)
library(lm.beta)
library(Hmisc)
library(qgraph)
library(plyr)
library(dplyr)
library(psych)
library(neurobase)
library(oro.nifti)
library(dplyr)
library(png)
library(ggiraphExtra)
library(data.table)
library(cowplot)
library(ggplot2)
library(patchwork)
# library(NMF)
library(ggbeeswarm)
#source("/Volumes/Users/nfranzme/R_Projects/R_Imaging/workbench/workbench_map_volume_to_surface#.R")
library(ggpubr)
library(lubridate)
library(reticulate)
library(svMisc)
library(NetworkToolbox)
library(readxl)
library(writexl)
library(lme4)
library(lmerTest)
library(stringr)
library(boot)
library(table1)
set the root directory to the full path to the fMRI_Day2_Graph_Theory folder
dir.root = "/Volumes/NO NAME/fMRI_Day2_Graph_Theory/"
dir.fc.200 = paste0(dir.root, "/Schaefer200_functional_connectivity/")
dir.data.sheets = paste0(dir.root, "/data_sheets/")
Step 1: load and prepare the data
The data frame contains data from the Alzheimer’s disease neuroimaging initiative (ADNI) database.
ADNI is a large multi-center study that focuses on multi-modal neuroimaging in Alzheimer’s disease patients, more information can be found here: http://adni.loni.usc.edu
Included imaging modalities in the current dataset are tau-PET, amyloid-PET and resting-state fMRI. We will subset the data to healthy controls (i.e. cognitively normal [CN] individuals without evidence for amyloid pathology [Ab.neg]), and subjects across the Alzheimer’s disease (AD) spectrum, defined as having abnormally elevated amyloid-beta (Ab) based on amyloid-PET. The AD spectrum includes individuals who are still cognitively normal (CN.Ab.pos, i.e. preclinical AD), subjects who show mild cognitive impairment (MCI.Ab.pos, i.e. prodromal AD), and patients with AD dementia (Dementia.Ab.pos). An illustration of the AD cascade is shown in the Figure below.
For this course we will work primarily with cross-sectional data, stored in the ADNI.data.bl data frame. However, some individuals also have longitudinal tau-PET, you could have a look at those data to if there is time left.
# read in ADNI data
ADNI.data <- readxl::read_xlsx(paste0(dir.data.sheets, "/ADNI_fMRI_PET.xlsx"))
# we'll exclude some people who have corrupted fMRI data
ADNI.data <- subset(ADNI.data, ID %nin% c("sub-4431", "sub-2133", "sub-6559"))
# merge diagnosis and amyloid status
ADNI.data$DX.Ab <- paste0(ADNI.data$DX, ".", ADNI.data$amyloid.global.SUVR.status)
ADNI.data$DX.Ab <- factor(ADNI.data$DX.Ab,
levels = c("CN.Ab.neg",
"CN.Ab.pos",
"MCI.Ab.pos",
"Dementia.Ab.pos"))
# subset to controls and subjects across the AD spectrum
# (CN = cognitively normal,
# MCI = Mild Cognitive Impairment,
# Dementia = Alzheimer's disease dementia)
ADNI.data <- subset(ADNI.data,
DX.Ab %in% c("CN.Ab.neg",
"CN.Ab.pos",
"MCI.Ab.pos",
"Dementia.Ab.pos"))
# locate functional connectivity matrices
ADNI.data$FC.mat.200 <- str_replace(ADNI.data$FC.mat.200,
"/Network/Cluster/ADNI/functional_connectivity/Schaefer_200/",
paste0(dir.root, "/Schaefer200_functional_connectivity/"))
# select baseline data
ADNI.data.bl = subset(ADNI.data, tau.pet.fu.yrs == 0)
First, you should get an idea of the type of data we’re working with. To this end we will check some basic distributions of biomarker and cognitive data for the current patient cohort.
We can first create an overview table of demographics, as well as amyloid-PET and tau-PET load, to
see how biomarkers are distributed across diagnostic groups.
We’ll look at the following variables (Variable names on the left, explanation on the right)
age = age
sex = sex
centiloid = global amyloid-PET level (a typical “cut-off” for abnormality is 20)
tau.global.SUVR = global tau-PET level (a typical “cut-off” for abnormality is 1.3)
MMSE = Mini Mental State Exam, i.e. a global cognitive screening instrument, where lower values indicate stronger cognitive impairment
ADNI_MEM = A memory composite score across several memory tests. Lower values indicate stronger impairment
table1(~ age + factor(sex) + centiloid + tau.global.SUVR + MMSE + ADNI_MEM | DX.Ab, data = ADNI.data.bl)
| CN.Ab.neg (N=192) |
CN.Ab.pos (N=67) |
MCI.Ab.pos (N=37) |
Dementia.Ab.pos (N=19) |
Overall (N=315) |
|
|---|---|---|---|---|---|
| age | |||||
| Mean (SD) | 72.2 (6.96) | 74.8 (5.96) | 75.9 (6.75) | 80.8 (8.09) | 73.7 (7.15) |
| Median [Min, Max] | 71.0 [56.0, 92.0] | 75.0 [65.0, 90.0] | 76.0 [61.0, 90.0] | 83.0 [62.0, 94.0] | 73.0 [56.0, 94.0] |
| factor(sex) | |||||
| F | 118 (61.5%) | 41 (61.2%) | 19 (51.4%) | 8 (42.1%) | 186 (59.0%) |
| M | 74 (38.5%) | 26 (38.8%) | 18 (48.6%) | 11 (57.9%) | 129 (41.0%) |
| centiloid | |||||
| Mean (SD) | -5.58 (12.8) | 65.6 (34.7) | 76.4 (31.7) | 93.0 (40.7) | 25.1 (45.6) |
| Median [Min, Max] | -7.12 [-34.3, 21.9] | 58.3 [23.2, 141] | 74.6 [22.8, 142] | 90.6 [44.2, 198] | 6.74 [-34.3, 198] |
| tau.global.SUVR | |||||
| Mean (SD) | 1.06 (0.0741) | 1.10 (0.0999) | 1.21 (0.222) | 1.36 (0.404) | 1.11 (0.164) |
| Median [Min, Max] | 1.06 [0.865, 1.45] | 1.10 [0.904, 1.62] | 1.15 [0.908, 1.83] | 1.23 [0.971, 2.37] | 1.08 [0.865, 2.37] |
| MMSE | |||||
| Mean (SD) | 29.3 (0.856) | 28.7 (1.74) | 26.7 (2.45) | 23.0 (4.73) | 27.7 (3.13) |
| Median [Min, Max] | 30.0 [27.0, 30.0] | 29.5 [24.0, 30.0] | 27.0 [22.0, 30.0] | 23.5 [14.0, 29.0] | 29.0 [14.0, 30.0] |
| Missing | 156 (81.2%) | 49 (73.1%) | 16 (43.2%) | 7 (36.8%) | 228 (72.4%) |
| ADNI_MEM | |||||
| Mean (SD) | 1.05 (0.566) | 0.924 (0.527) | 0.143 (0.584) | -0.685 (0.711) | 0.812 (0.740) |
| Median [Min, Max] | 0.991 [-0.229, 3.14] | 0.874 [-0.119, 1.94] | 0.0840 [-0.964, 1.69] | -0.509 [-2.39, 0.603] | 0.836 [-2.39, 3.14] |
| Missing | 1 (0.5%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (0.3%) |
As you can see, amyloid (centiloid) and tau levels (tau.global.SUVR)
increase (i.e. become more abnormal) across more severe disease levels.
Also, MMSE and ADNI_MEM values decrease, indicating stronger cognitive impairment.
Next, we should visualize distributions of amyloid and tau biomarkers,
to get an idea of how amyloid and tau relate to increasing disease severity.
We’ll use ggplot2 for plotting. Some useful ggplot2 tutorials can be found here:
https://ggplot2.tidyverse.org
http://r-statistics.co/Complete-Ggplot2-Tutorial-Part1-With-R-Code.html
http://r-statistics.co/ggplot2-Tutorial-With-R.html
Let’s first create a boxplot/beeswarm plot to illustrate the amyloid distribution across different diagnostic groups. The variables we’ll look at are centiloid (i.e. amyloid-PET) and DX.Ab (i.e. diagnostic groups)
# plot group differences
ggplot(data = ADNI.data.bl,
aes(
x = DX.Ab,
y = centiloid)) +
geom_beeswarm() +
geom_boxplot(alpha = 0.1, notch = T, width = 0.5) +
theme_minimal() +
geom_hline(yintercept = 20, linetype = 2)
As you can see, amyloid levels increase across the AD spectrum. The dashed line indicates where amyloid becomes abnormal.
To statistically test group differences in amyloid, we can run an ANCOVA. It’s quite common standard to control for some covariates in clinical research, such as age and sex (and often also education)
So, lets quantify the group differences using an ANCOVA with a post-hoc Tukey Test. The ANCOVA tests whether there is an overall group-difference between diagnostic groups. The Post-Hoc Tukey test assesses the difference between single diagnostic groups. ANCOVAs can be run using the aov command and requires a data frame as input (here it is ADNI.data.bl), as well as an equation that describes the relationship between dependent and independent variables.
Here, the equation is centiloid ~ DX.Ab + age + sex, which means that we test group differences in centiloid between DX.Ab groups, while controlling for age and sex.
# test group differences (ANCOVA), controlling for age and sex
## run ANCOVA
tmp.aov <- aov(data = ADNI.data.bl,
centiloid ~ DX.Ab + age + sex);
## summarize output
summary(tmp.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## DX.Ab 3 475451 158484 280.349 <2e-16 ***
## age 1 1435 1435 2.538 0.112
## sex 1 1078 1078 1.908 0.168
## Residuals 309 174681 565
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## run post-hoc Tukey test to determine group differences
TukeyHSD(tmp.aov, which = "DX.Ab")
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = centiloid ~ DX.Ab + age + sex, data = ADNI.data.bl)
##
## $DX.Ab
## diff lwr upr p adj
## CN.Ab.pos-CN.Ab.neg 71.16379 62.4493273 79.87825 0.0000000
## MCI.Ab.pos-CN.Ab.neg 81.99277 70.9660878 93.01945 0.0000000
## Dementia.Ab.pos-CN.Ab.neg 98.54186 83.7714568 113.31227 0.0000000
## MCI.Ab.pos-CN.Ab.pos 10.82898 -1.7503318 23.40830 0.1191229
## Dementia.Ab.pos-CN.Ab.pos 27.37807 11.4151082 43.34104 0.0000770
## Dementia.Ab.pos-MCI.Ab.pos 16.54909 -0.7847542 33.88294 0.0673472
Let’s do the same for tau-PET, so lets first create a boxplot/beeswarm plot to illustrate the tau distribution across diagnostic groups
# plot group differences
ggplot(data = ADNI.data.bl,
aes(
x = DX.Ab,
y = tau.global.SUVR)) +
geom_boxplot(alpha = 0.1, notch = T, width = 0.5) +
geom_beeswarm() +
theme_minimal() +
geom_hline(yintercept = 1.3, linetype = 2)
As you can see, tau levels also increase across the AD spectrum. However, people usually start surpassing the threshold at later stages than for amyloid. This is the case because amyloid precedes symptom onset by decades, while tau pathology is much closer to symptom onset. In fact, tau pathology is the key driver of neurodegeneration and cognitive decline, so you typically see abnormal tau first when people start to show symptoms.
To statistically test these group differences, we can again run an ANCOVA with a post-hoc Tukey Test
# test group differences (ANCOVA), controlling for age and sex
## run ANCOVA
tmp.aov <- aov(data = ADNI.data.bl,
tau.global.SUVR ~ DX.Ab + age + sex);
## summarize output
summary(tmp.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## DX.Ab 3 1.998 0.6660 34.412 < 2e-16 ***
## age 1 0.350 0.3503 18.101 2.78e-05 ***
## sex 1 0.084 0.0841 4.345 0.0379 *
## Residuals 309 5.981 0.0194
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## run post-hoc Tukey test to determine group differences
TukeyHSD(tmp.aov, which = "DX.Ab")
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = tau.global.SUVR ~ DX.Ab + age + sex, data = ADNI.data.bl)
##
## $DX.Ab
## diff lwr upr p adj
## CN.Ab.pos-CN.Ab.neg 0.03695129 -0.01403989 0.08794248 0.2423433
## MCI.Ab.pos-CN.Ab.neg 0.15096161 0.08644085 0.21548237 0.0000000
## Dementia.Ab.pos-CN.Ab.neg 0.29530003 0.20887351 0.38172655 0.0000000
## MCI.Ab.pos-CN.Ab.pos 0.11401032 0.04040458 0.18761605 0.0004571
## Dementia.Ab.pos-CN.Ab.pos 0.25834874 0.16494414 0.35175334 0.0000000
## Dementia.Ab.pos-MCI.Ab.pos 0.14433842 0.04291236 0.24576449 0.0015846
One thing to keep in mind is that the above described analyses on tau-PET are based on “global” tau levels. However, tau does not - in contrast to amyloid - accumulate globally throughout the brain, but rather follows a consecutive spreading pattern that typically starts in the inferior temporal lobe. This “spreading pattern” of tau pathology has been already described in the nineties by Braak & Braak.
The surface rendering shows the six Braak-stages. Abnormal tau pathology is typically found first in Stage 1 (Entorhinal cortex) and then spreads to the hippocampus (Stage 2), the inferior temporal and limbic cortex (Stages 3&4) and finally to the association and primary somatosensory cortex (Stages 5&6). The plot on the left illustrates the likelihood of tau abnormality within each Braak stage across a group of AD patients, showing that early Braak stages typically show earlier tau abnormality than later Braak stages. Note that we usually discard the hippocampus when looking at tau-PET, because the tau-PET tracer shows quite significant off-target binding in the choroid plexus which is right next to the hippocampus. Thus, tau-PET signal in the hippocampus is quite confounded, at least for first-generation tau-PET tracers like flortaucipir.
Next, lets look at the distribution of tau pathology in the separate Braak stage ROIs. Tau-PET within these Braak-stage ROIs is labeled as tau.braak1.SUVR, tau.braak3.SUVR, tau.braak4.SUVR, etc.
# plot group differences
# note that you can store plots in the workspace (e.g. p1 below) and later combine them using the patchwork package
p1 <- ggplot(data = ADNI.data.bl,
aes(
x = DX.Ab,
y = tau.braak1.SUVR)) +
geom_boxplot(alpha = 0.1, notch = T, width = 0.5) +
geom_beeswarm() +
theme_minimal() +
geom_hline(yintercept = 1.3, linetype = 2) +
ggtitle("Braak 1")
p2 <- ggplot(data = ADNI.data.bl,
aes(
x = DX.Ab,
y = tau.braak3.SUVR)) +
geom_boxplot(alpha = 0.1, notch = T, width = 0.5) +
geom_beeswarm() +
theme_minimal() +
geom_hline(yintercept = 1.3, linetype = 2) +
ggtitle("Braak 3")
p3 <- ggplot(data = ADNI.data.bl,
aes(
x = DX.Ab,
y = tau.braak4.SUVR)) +
geom_boxplot(alpha = 0.1, notch = T, width = 0.5) +
geom_beeswarm() +
theme_minimal() +
geom_hline(yintercept = 1.3, linetype = 2) +
ggtitle("Braak 4")
p4 <- ggplot(data = ADNI.data.bl,
aes(
x = DX.Ab,
y = tau.braak5.SUVR)) +
geom_boxplot(alpha = 0.1, notch = T, width = 0.5) +
geom_beeswarm() +
theme_minimal() +
geom_hline(yintercept = 1.3, linetype = 2) +
ggtitle("Braak 5")
p5 <- ggplot(data = ADNI.data.bl,
aes(
x = DX.Ab,
y = tau.braak6.SUVR)) +
geom_boxplot(alpha = 0.1, notch = T, width = 0.5) +
geom_beeswarm() +
theme_minimal() +
geom_hline(yintercept = 1.3, linetype = 2) +
ggtitle("Braak 6")
# you can combine different plots using the patchwork package
# a "/" will add a new plot below, a "+" will add a plot next to the first one
p1 / p2 / p3 / p4 / p5
You can see that earlier Braak stage ROIs show abnormal signal more early during the disease course than later Braak stage ROIs
Our current understanding of Alzheimer’s disease suggests that amyloid-beta deposition is the first pathological event to happen, that triggers downstream tau accumulation, neurodegeneration and cognitive decline. Thus, tau and amyloid should be correlated. Let’s have a look:
ggplot(ADNI.data.bl,
aes(
x = centiloid,
y = tau.global.SUVR
)) +
geom_point() +
geom_smooth(method = "lm") +
theme_minimal()
## `geom_smooth()` using formula 'y ~ x'
This shows that higher amyloid is clearly associated with higher tau pathology. We can quantify the association using correlation or linear regression controlling for age and sex.
To assess the correlation, we use the rcorr command from the Hmisc package.
The input arguments are just two numeric vectors of the same length.
# correlation
rcorr(ADNI.data.bl$centiloid,
ADNI.data.bl$tau.global.SUVR)
## x y
## x 1.00 0.43
## y 0.43 1.00
##
## n= 315
##
##
## P
## x y
## x 0
## y 0
To run a linear regression, we use the lm command. This works very similar to the aov command shown above for determining group differenecs. All we need is a data frame that contains our data (ADNI.data.bl), as well as an equation that describes the relationship between our independent variable tau.global.SUVR and our dependent variable centiloid, while controlling for age and sex.
# linear regression controlling for age and sex
tmp.lm <- lm(data = ADNI.data.bl,
tau.global.SUVR ~ centiloid + age + sex); summary(tmp.lm)
##
## Call:
## lm(formula = tau.global.SUVR ~ centiloid + age + sex, data = ADNI.data.bl)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.28412 -0.06856 -0.01542 0.04292 1.14154
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.3685946 0.0874300 15.654 < 2e-16 ***
## centiloid 0.0017479 0.0001875 9.320 < 2e-16 ***
## age -0.0040437 0.0012088 -3.345 0.000923 ***
## sexM -0.0198439 0.0168082 -1.181 0.238662
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.145 on 311 degrees of freedom
## Multiple R-squared: 0.2233, Adjusted R-squared: 0.2158
## F-statistic: 29.8 on 3 and 311 DF, p-value: < 2.2e-16
Now that you know how to run group comparisons (i.e. ANCOVAs) and correlations/regression (linear models), try to check whether amyloid and tau are associated with cognition (e.g. MMSE and ADNI_MEM). How do amyloid and tau relate to cognitive performance? Try to create figures and statistics showing differences in cognitive performance between diagnostic groups, and associations between amyloid, tau and cognition.
To this end, you can create a new R-script, just click on the top left File > New File > R-script as shown below. A new script will open up, where you can generate and run code.
Not that we’re familiar with the basics of AD biomarkers and R, lets go have a look at fMRI parameters (since this is actually a course on connectomics). I have prepared functional connectivity data using a 200 ROI cortical atlas (see https://pubmed.ncbi.nlm.nih.gov/28981612/), that we commonly use for fMRI analyses. A surface rendering of the atlas can be found below, including the seven networks (DMN = Default-Mode, DAN = Dorsal Attention, VAN = Ventral Attention, FPCN = Fronto-parietal control, Visual, Motor, Limbic). If you want to get some methodological details on how we processed the data, please refer to our previous paper here: https://www.nature.com/articles/s41467-019-14159-1.
To compute functional connectivity, we have extracted BOLD timeseries for each ROI, which were then correlated between ROIs.
Atlas based connectivity is usually stored in matrix format also referred to as an adjacency matrix. For 200 ROIs this means that the resulting adjacency matrix is 200x200 elements large, where each element corresponds to the connectivity-metric between two ROIs (e.g. Fisher-z transformed Pearson correlation). All connectivity matrices are stored in the “Schaefer200_functional_connectivity” folder on the hard drive in .csv format. The connectivity matrix are 200 x 200 where each row and column corresponds to an ROI (see below). The diagnonal contains autocorrelations and is thus set to infinite values.
This path to the connectivity folder is already stored in your R worksapce as dir.fc.200. The paths to the matrices of each participant are stored in the ADNI.data.bl data frame in the variable FC.mat.200. You can access this variable by navigating with the $ sign, e.g. ADNI.data.bl$FC.mat.200. Lets load an example connectivity matrix from one of the ADNI participants to see how this looks like.
# import and format connectivity matrix using the read.csv command
current.adj.matrix = read.csv(
paste0(dir.root,
"Schaefer200_functional_connectivity/Schaefer_200-ADNIsub-4187_ses-101_task-rest_bold_mcf_covremoved_detrended_filtered_MNI_8mm_smoothed.csv"))
When checking the dimension of this matrix, we realize that there are in fact 201 columns and 200 rows. The first column contains the rownames, so we need to remove it in order to get a symmetrical 200x200 matrix.
# check dimensions
dim(current.adj.matrix)
## [1] 200 201
# remove first column
current.adj.matrix.cleaned <- current.adj.matrix[,-1]
# clean diagnonal (i.e. remove infinite values with zeros)
diag(current.adj.matrix.cleaned) = 0
# transform to matrix format
current.adj.matrix.cleaned.matrix <- as.matrix(current.adj.matrix.cleaned)
# check dimensions
dim(current.adj.matrix.cleaned)
## [1] 200 200
Now the connectivity matrix has 200 rows and 200 columns. Lets have a look at the overall connectivity pattern. You can plot connectivity matrices using the “corrplot” package.
# plot correlation matrix
corrplot::corrplot(current.adj.matrix.cleaned.matrix,
diag = FALSE,
tl.pos = "n",
tl.cex = 0.5,
method = "color",
is.corr = FALSE)
You can see that the connectivity matrix is symmetrical, with positive correlations and negative correlations. Positive correlations (blue colors) mean that two brain regions show correlated BOLD signal, suggesting that they’re functionally connected. Negative correlations indicate anti-correlations, where one region is up-regulated and another one is down-regulated at the same time.
In order to perform graph theoretical analyses, we usually perform some additional preprocessing steps, e.g. we omit negative connections or perform thresholding of the matrix.
I’ve attached a function that can perform some basic thresholding operations for you. The function below can be called from the command line, and needs several input arguments:
adj_matrix = a symmetrical adjacency matrix,
retain_negatives = a boolean statement (TRUE/FALSE) indicating whether or not negative values should be retained,
density = a percentage of connections that should be retained. A density of 1 indicates that 100% of connections should be retained, a density of 0.3 means that only 30% of the strongest connections will be retained,
replace_with_NAs = a boolean statement (TRUE/FALSE) indicating whether values that are eliminated from the matrix should be replaced with an NA (not assigned). Otherwise, values are replaced with a zero.
If you’re interested in how to write functions in R, you can refer to one of the following tutorials: https://swcarpentry.github.io/r-novice-inflammation/02-func-R/, https://r4ds.had.co.nz/functions.html
# define matrix thresholding function
adj_mat_density_thresh=function(adj_matrix, retain_negatives, density, replace_with_NAs){
# exclude negative values
if (retain_negatives==F){adj_matrix[adj_matrix<0]=0}
# determine threshold
threshold=quantile(abs(adj_matrix), 1-density)
# apply threshold
if (replace_with_NAs==F){adj_matrix[abs(adj_matrix)<threshold]=0}
if (replace_with_NAs==T){adj_matrix[abs(adj_matrix)<threshold]=NA}
return(adj_matrix)
}
Next, lets threshold the matrix that we’ve imported above and eliminate all the negative values (i.e. retain_negatives = F), while keeping all positive values (i.e. density = 1). Everything else should be replaced with a zero.
# call the function and store the output in a new variable
current.adj.matrix.cleaned.matrix.thr1 <- adj_mat_density_thresh(adj_matrix = current.adj.matrix.cleaned.matrix,
retain_negatives = F,
density = 1,
replace_with_NAs = F)
Let’s see how the new matrix looks like:
# plot correlation matrix
corrplot::corrplot(current.adj.matrix.cleaned.matrix.thr1,
diag = FALSE,
tl.pos = "n",
tl.cex = 0.5,
method = "color",
is.corr = FALSE)
You can see that all the negative red values have been eliminated, while the positive values have been kept.
Next, lets see what happens if we further threshold the matrix. As an example, we want to retain only 10% of the strongest positive connections (i.e. a density of 0.1). Thresholding is often performed to remove “noisy” connections.
# call the function and store the output in a new variable
current.adj.matrix.cleaned.matrix.thr0p1 <- adj_mat_density_thresh(adj_matrix = current.adj.matrix.cleaned.matrix,
retain_negatives = F,
density = 0.1,
replace_with_NAs = F)
Let’s see how the thresholded matrix looks like:
# plot correlation matrix
corrplot::corrplot(current.adj.matrix.cleaned.matrix.thr0p1,
diag = FALSE,
tl.pos = "n",
tl.cex = 0.5,
method = "color",
is.corr = FALSE)
You can see that the matrix looks quite sparse now and only the “backbone” of very strong connections is retained. Sometimes thresholding is done to eliminate “noisy” and “weak” connections. However, these weak connections have actually been shown to not only consist of noise, but to encode some important inter-individual differences, as shown by papers on “connectome fingerprinting” (see work by Emily Finn https://www.nature.com/articles/nn.4135).
While the matrix above is relatively sparse, it still includes information about the “strength” of a connection, which we refer to as a weighted matrix Some researchers, however, use binary matrices, which just encodes the presence or absence of a connection. This is often done for structural connectivity matrices. Binarization can be done quite easily using the NetworkToolbox. So lets binarize the matrix in which we’ve retained only 10% of the strongest connections
current.adj.matrix.cleaned.matrix.thr0p1.bin <-
NetworkToolbox::binarize(current.adj.matrix.cleaned.matrix.thr0p1)
# plot correlation matrix
corrplot::corrplot(current.adj.matrix.cleaned.matrix.thr0p1.bin,
diag = FALSE,
tl.pos = 'n',
tl.cex = 0.5,
method = "color",
is.corr = FALSE)
Now, all information about the strength of connections is lost, and we’ve retained the 10% of strongest connections. Next, we will look at some graph theoretical parameters (which can be computed on all sorts of graphs) to study complex network function. Keep in mind that the the way you process the graphs can drastically change its’ appearance. So thresholding and binarizing your graph should always be motivated by theoretical considerations.
So lets use the connectivity matrix to compute some basic graph theoretical parameters. Graph theory is a mathematical approach to study networks that are defined by nodes that are connected via edges (see below). In our case, nodes are Regions of interest (ROIs) of the 200-ROI brain atlas, and the edges are the connectivity strength between the different nodes/ROIs. There are different types of graphs, including binary graphs, weighted graphs, and directed graphs that also encode information about the directionality of information flow. In fMRI, we usually work with weighted undirected graphs.
In fact, we can already render our connectivity matrix as a graph using the qplot command qgraph. To this end, we also read in the network affiliation or community vector, which tells us to which network a given ROI belongs to (e.g. DMN, DAN, Limbic, Motor etc.)
# load community vector
Community <- read.table(
paste0(dir.root,
"Atlas/Schaefer2018_200Parcels_7Networks_order_numeric.txt"))
Community$names <- mapvalues(Community$V1,
from=c(1,2,3,4,5,6,7),
to=c("Visual",
"Motor",
"DAN",
"VAN",
"Limbic",
"FPCN",
"DMN"))
# create plot
qgraph(current.adj.matrix.cleaned.matrix.thr0p1,
groups = Community$names,
layout = "spring",
labels=F,
color=c("forestgreen" ,
"indianred2" ,
"darkgoldenrod1" ,
"lemonchiffon1" ,
"skyblue3" ,
"mediumorchid1" ,
"magenta4"),
posCol=c("gray68", "gray49"))
Now that you know how to render graphs, you can check how applying different thresholds changes the appearence of your graph.
So why do we need graph theory at all?
The advantage of graph theory is that it allows to derive more complex parameters that describe network topology, such as measures of modularity, segregation and integration. A great overview of different graph theoretical measures can be found in the landmark paper by Rubinov & Sporns from 2010 (https://pubmed.ncbi.nlm.nih.gov/19819337/). The paper is, however, not open access, so I’ve stored a copy in the /fMRI_Day2_Graph_Theory/Literature/ folder for you.
How do we compute graph theoretical parameters?
For computing graph theoretical parameters, we can use various different toolboxes and programs. The most famous one is the brain connectivity toolbox in matlab, which was developed by Olaf Sporns and colleagues (see https://sites.google.com/site/bctnet/). However, matlab is quite old-fashioned and not open source, so people (including me) have moved more and more away from using it. There are many newer toolboxes for python (e.g. https://pypi.org/project/bctpy/) and R, which can basically do the same things. Since we’re working in R, we will use the “NetworkToolbox”,which comes with some easy to use commands (see [<https://cran.r-project.org/web/packages/NetworkToolbox/NetworkToolbox.pdf>](https://cran.r-project.org/web/packages/NetworkToolbox/NetworkToolbox.pdf).Alternatively))
Or alternatively, you can code the functions yourself. This sounds complicated, but really isn’t once you get a bit more skilled in coding.
Which parameters do we cover?
We’ll have a look at some basic parameters of network segregation and network integration. Specifically, we’ll look at clustering, degree, within-network connectivity, between-network connectivity (community strength), path-length, small-worldness and network segregation.
Let’s start with clustering. The clustering coefficient quantifies the abundance of connected triangles in a network and is a major descriptive statistics of networks. This means, if a node is connected to two neighbours, and if these neighbours are also interconnected, they form a cluster (triangle), as illustrated in the example below. The clustering coefficient just counts the number of triangles that can be found in a given community of nodes. Clustering is most often referred to as a measure of segregation, indicating how strong local communities are interconnected (e.g. whether your friends are also friends among each other)
So lets compute the clustering coefficient on our connectivity matrix. We can compute a “Global” clustering coefficient for the entire network, as well as a clustering coefficient for each single node in the network. In fact, the global clustering coefficient is just the average clustering across all network nodes (so 200 in our case). To compute the clustering coefficient, we use the clustcoeff command from the NetworkToolbox. This command requires two inputs, an adjacency matrix and a TRUE/FALSE statement on whether the matrix is weighted or binary. You can find information about this function by calling help(“clustcoeff”) from the console. Clustering is usually computed on networks with “positive” connections only, hence we need to use a connectivity matrix from which we have eliminated the negative connections called current.adj.matrix.cleaned.matrix.thr1 (see chapter 2.1. Removing negative connections). We can compute the clustering coefficient for networks thresholded at a density of 1 and 0.1 to see how the clustering coefficient changes across thresholds.
# compute clustering
tmp.clustering.weighted.thr1 <- clustcoeff(current.adj.matrix.cleaned.matrix.thr1, weighted = T)
tmp.clustering.weighted.thr0p1 <- clustcoeff(current.adj.matrix.cleaned.matrix.thr0p1, weighted = T)
The output contains two different measures, the global clustering coefficient CC, and the local clustering coefficient for each ROI called CCi. You can choose which one to look at by navigating with the $ sign.
# global clustering
tmp.clustering.weighted.thr1$CC
## [1] 0.189345
tmp.clustering.weighted.thr0p1$CC
## [1] 0.448535
# ROI-wise clustering at a density threshold of 1
tmp.clustering.weighted.thr1$CCi
## X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13
## 0.178 0.199 0.146 0.278 0.314 0.161 0.323 0.122 0.283 0.221 0.175 0.152 0.247
## X14 X15 X16 X17 X18 X19 X20 X21 X22 X23 X24 X25 X26
## 0.220 0.188 0.170 0.186 0.226 0.260 0.263 0.227 0.223 0.190 0.197 0.171 0.208
## X27 X28 X29 X30 X31 X32 X33 X34 X35 X36 X37 X38 X39
## 0.172 0.197 0.180 0.177 0.173 0.145 0.130 0.246 0.181 0.140 0.130 0.163 0.183
## X40 X41 X42 X43 X44 X45 X46 X47 X48 X49 X50 X51 X52
## 0.157 0.182 0.153 0.152 0.165 0.270 0.191 0.175 0.180 0.260 0.262 0.134 0.248
## X53 X54 X55 X56 X57 X58 X59 X60 X61 X62 X63 X64 X65
## 0.153 0.237 0.139 0.188 0.110 0.112 0.146 0.083 0.160 0.150 0.158 0.162 0.196
## X66 X67 X68 X69 X70 X71 X72 X73 X74 X75 X76 X77 X78
## 0.202 0.253 0.137 0.196 0.177 0.144 0.117 0.182 0.094 0.125 0.153 0.212 0.164
## X79 X80 X81 X82 X83 X84 X85 X86 X87 X88 X89 X90 X91
## 0.216 0.183 0.231 0.288 0.117 0.188 0.136 0.203 0.139 0.236 0.256 0.160 0.275
## X92 X93 X94 X95 X96 X97 X98 X99 X100 X101 X102 X103 X104
## 0.201 0.255 0.252 0.196 0.181 0.231 0.186 0.279 0.106 0.134 0.118 0.253 0.215
## X105 X106 X107 X108 X109 X110 X111 X112 X113 X114 X115 X116 X117
## 0.150 0.308 0.160 0.378 0.199 0.188 0.167 0.291 0.170 0.242 0.189 0.206 0.187
## X118 X119 X120 X121 X122 X123 X124 X125 X126 X127 X128 X129 X130
## 0.225 0.202 0.222 0.239 0.268 0.236 0.203 0.257 0.248 0.198 0.198 0.180 0.166
## X131 X132 X133 X134 X135 X136 X137 X138 X139 X140 X141 X142 X143
## 0.204 0.152 0.178 0.178 0.133 0.145 0.240 0.204 0.216 0.182 0.185 0.146 0.132
## X144 X145 X146 X147 X148 X149 X150 X151 X152 X153 X154 X155 X156
## 0.213 0.175 0.143 0.200 0.163 0.163 0.296 0.161 0.199 0.189 0.186 0.231 0.241
## X157 X158 X159 X160 X161 X162 X163 X164 X165 X166 X167 X168 X169
## 0.174 0.268 0.131 0.138 0.180 0.115 0.135 0.109 0.175 0.130 0.157 0.156 0.122
## X170 X171 X172 X173 X174 X175 X176 X177 X178 X179 X180 X181 X182
## 0.231 0.188 0.096 0.138 0.157 0.176 0.194 0.144 0.126 0.181 0.129 0.180 0.154
## X183 X184 X185 X186 X187 X188 X189 X190 X191 X192 X193 X194 X195
## 0.186 0.266 0.090 0.191 0.136 0.205 0.175 0.159 0.232 0.245 0.117 0.292 0.265
## X196 X197 X198 X199 X200
## 0.176 0.213 0.201 0.230 0.215
# distribution of ROI-wise clustering at a density threshold of 1
hist(tmp.clustering.weighted.thr1$CCi)
# plot the association of clustering coefficients across different thresholds
plot(tmp.clustering.weighted.thr1$CCi, tmp.clustering.weighted.thr0p1$CCi)
# quantify the association of clustering coefficients across different thresholds
rcorr(tmp.clustering.weighted.thr1$CCi, tmp.clustering.weighted.thr0p1$CCi)
## x y
## x 1.00 0.43
## y 0.43 1.00
##
## n= 200
##
##
## P
## x y
## x 0
## y 0
You can see that clustering is much higher in the network that was thresholded more restrictively. Any idea/hypothesis why this could be the case?
The degree of a node quantifies the number or strength of connections that a given node has to the rest of the network (see figure below)
The degree can also be computed relatively simple as the row or column Means of a adjacency matrix, which just quantifies the strength of connections that a given node has to the remaining 199 nodes.
# compute degree
tmp.degree.weighted.thr1 <- colMeans(current.adj.matrix.cleaned.matrix.thr1, na.rm = T)
hist(tmp.degree.weighted.thr1)
tmp.degree.weighted.thr0p1 <- colMeans(current.adj.matrix.cleaned.matrix.thr0p1, na.rm = T)
hist(tmp.degree.weighted.thr0p1)
# correlation of clustering coefficients across different thresholds
plot(tmp.degree.weighted.thr1,tmp.degree.weighted.thr0p1)
rcorr(tmp.degree.weighted.thr1,tmp.degree.weighted.thr0p1)
## x y
## x 1.00 0.71
## y 0.71 1.00
##
## n= 200
##
##
## P
## x y
## x 0
## y 0
One of the most simple measures that we can determine in connectomics is a parameter of within-network connectivity. For example, we can calculate the mean connectivity among nodes of the default mode network. To this end, we use the participation command of the network toolbox. Again, this command requires a matrix A as an input, as well as a community vector that assigns the different nodes to networks. Thus, we need to know first to which network a given node belongs to. To this end, we import the community vector for the Schaefer200 atlas.
# load community vector
Community <- read.table(paste0(dir.root, "Atlas/Schaefer2018_200Parcels_7Networks_order_numeric.txt"))
Community$names <- mapvalues(Community$V1,
from=c(1,2,3,4,5,6,7),
to=c("Visual", "Motor", "DAN", "VAN", "Limbic", "FPCN", "DMN"))
# check the number of ROIs assigned to each node
table(Community$names)
##
## DAN DMN FPCN Limbic Motor VAN Visual
## 26 46 30 12 35 22 29
Next, we compute the within network connectivity using the participation command
tmp.particpation.weighted.thr1 <-
participation(A = current.adj.matrix.cleaned.matrix.thr1,
comm = Community$names)
# now we get a measure of how strong a given node is connected to other nodes of the same network
print(tmp.particpation.weighted.thr1)
## $overall
## X1 X2 X3 X4 X5 X6 X7 X8
## 0.4820195 0.6003630 0.2465394 0.6724726 0.7123501 0.3952854 0.7214967 0.2092112
## X9 X10 X11 X12 X13 X14 X15 X16
## 0.7035690 0.5912497 0.2983282 0.3861684 0.6498905 0.4875014 0.2268144 0.2469066
## X17 X18 X19 X20 X21 X22 X23 X24
## 0.3098999 0.3229858 0.3393288 0.3871954 0.3213868 0.3304328 0.3113349 0.3325983
## X25 X26 X27 X28 X29 X30 X31 X32
## 0.2217112 0.3494110 0.3076680 0.2590126 0.2827535 0.3486017 0.4062768 0.1949049
## X33 X34 X35 X36 X37 X38 X39 X40
## 0.2346423 0.2695550 0.2477828 0.1955700 0.2303987 0.1997892 0.3255592 0.2118937
## X41 X42 X43 X44 X45 X46 X47 X48
## 0.2312147 0.2080533 0.1885418 0.2473494 0.2890315 0.2408984 0.2510278 0.2737105
## X49 X50 X51 X52 X53 X54 X55 X56
## 0.3240872 0.2858554 0.2266128 0.2860678 0.1889608 0.3076745 0.2590483 0.3173507
## X57 X58 X59 X60 X61 X62 X63 X64
## 0.1562264 0.1664796 0.1806980 0.1772290 0.2802008 0.2640610 0.2285110 0.2995103
## X65 X66 X67 X68 X69 X70 X71 X72
## 0.3988171 0.3476939 0.4107263 0.2458319 0.3080117 0.2476690 0.2569082 0.2998278
## X73 X74 X75 X76 X77 X78 X79 X80
## 0.2081875 0.2860152 0.3121896 0.4022291 0.3606270 0.2103608 0.4523347 0.2898885
## X81 X82 X83 X84 X85 X86 X87 X88
## 0.4214855 0.4357242 0.2657990 0.3766702 0.2648457 0.4112567 0.2332311 0.5044254
## X89 X90 X91 X92 X93 X94 X95 X96
## 0.4954990 0.3688872 0.4503789 0.3724656 0.3831799 0.3978120 0.3170924 0.3120408
## X97 X98 X99 X100 X101 X102 X103 X104
## 0.5168357 0.3474566 0.4871479 0.2246716 0.2393519 0.3279044 0.6177146 0.5862825
## X105 X106 X107 X108 X109 X110 X111 X112
## 0.3044554 0.7334970 0.4106253 0.7615054 0.5908725 0.5124012 0.3922545 0.7206554
## X113 X114 X115 X116 X117 X118 X119 X120
## 0.5088558 0.6335478 0.3688571 0.2369283 0.2471229 0.3373832 0.3010028 0.3056114
## X121 X122 X123 X124 X125 X126 X127 X128
## 0.3191850 0.3852741 0.2878719 0.3345766 0.3576146 0.3354800 0.3173557 0.2811183
## X129 X130 X131 X132 X133 X134 X135 X136
## 0.3282735 0.2787811 0.2646994 0.2452368 0.2887026 0.2832019 0.2697465 0.1697001
## X137 X138 X139 X140 X141 X142 X143 X144
## 0.3424444 0.2231549 0.2678094 0.2583324 0.2000580 0.2090354 0.1867517 0.2243998
## X145 X146 X147 X148 X149 X150 X151 X152
## 0.2037349 0.2105233 0.2283949 0.1774096 0.2117993 0.3185335 0.1914717 0.2350078
## X153 X154 X155 X156 X157 X158 X159 X160
## 0.2374945 0.2676401 0.3228837 0.2749466 0.2019710 0.2920097 0.2312663 0.2536593
## X161 X162 X163 X164 X165 X166 X167 X168
## 0.3671672 0.1582853 0.1558215 0.1678399 0.2035401 0.2742840 0.3186933 0.2183891
## X169 X170 X171 X172 X173 X174 X175 X176
## 0.2096937 0.3644996 0.3914623 0.2643055 0.1844223 0.2111139 0.3290036 0.3469771
## X177 X178 X179 X180 X181 X182 X183 X184
## 0.2807223 0.3317799 0.2063050 0.3401475 0.2742733 0.2412250 0.3358014 0.4309520
## X185 X186 X187 X188 X189 X190 X191 X192
## 0.2440685 0.3638181 0.2681254 0.3560519 0.2610982 0.2326433 0.3871452 0.5114821
## X193 X194 X195 X196 X197 X198 X199 X200
## 0.3040894 0.4769509 0.4205419 0.2744101 0.3018878 0.3866420 0.4457668 0.3926145
##
## $positive
## X1 X2 X3 X4 X5 X6 X7 X8
## 0.4820195 0.6003630 0.2465394 0.6724726 0.7123501 0.3952854 0.7214967 0.2092112
## X9 X10 X11 X12 X13 X14 X15 X16
## 0.7035690 0.5912497 0.2983282 0.3861684 0.6498905 0.4875014 0.2268144 0.2469066
## X17 X18 X19 X20 X21 X22 X23 X24
## 0.3098999 0.3229858 0.3393288 0.3871954 0.3213868 0.3304328 0.3113349 0.3325983
## X25 X26 X27 X28 X29 X30 X31 X32
## 0.2217112 0.3494110 0.3076680 0.2590126 0.2827535 0.3486017 0.4062768 0.1949049
## X33 X34 X35 X36 X37 X38 X39 X40
## 0.2346423 0.2695550 0.2477828 0.1955700 0.2303987 0.1997892 0.3255592 0.2118937
## X41 X42 X43 X44 X45 X46 X47 X48
## 0.2312147 0.2080533 0.1885418 0.2473494 0.2890315 0.2408984 0.2510278 0.2737105
## X49 X50 X51 X52 X53 X54 X55 X56
## 0.3240872 0.2858554 0.2266128 0.2860678 0.1889608 0.3076745 0.2590483 0.3173507
## X57 X58 X59 X60 X61 X62 X63 X64
## 0.1562264 0.1664796 0.1806980 0.1772290 0.2802008 0.2640610 0.2285110 0.2995103
## X65 X66 X67 X68 X69 X70 X71 X72
## 0.3988171 0.3476939 0.4107263 0.2458319 0.3080117 0.2476690 0.2569082 0.2998278
## X73 X74 X75 X76 X77 X78 X79 X80
## 0.2081875 0.2860152 0.3121896 0.4022291 0.3606270 0.2103608 0.4523347 0.2898885
## X81 X82 X83 X84 X85 X86 X87 X88
## 0.4214855 0.4357242 0.2657990 0.3766702 0.2648457 0.4112567 0.2332311 0.5044254
## X89 X90 X91 X92 X93 X94 X95 X96
## 0.4954990 0.3688872 0.4503789 0.3724656 0.3831799 0.3978120 0.3170924 0.3120408
## X97 X98 X99 X100 X101 X102 X103 X104
## 0.5168357 0.3474566 0.4871479 0.2246716 0.2393519 0.3279044 0.6177146 0.5862825
## X105 X106 X107 X108 X109 X110 X111 X112
## 0.3044554 0.7334970 0.4106253 0.7615054 0.5908725 0.5124012 0.3922545 0.7206554
## X113 X114 X115 X116 X117 X118 X119 X120
## 0.5088558 0.6335478 0.3688571 0.2369283 0.2471229 0.3373832 0.3010028 0.3056114
## X121 X122 X123 X124 X125 X126 X127 X128
## 0.3191850 0.3852741 0.2878719 0.3345766 0.3576146 0.3354800 0.3173557 0.2811183
## X129 X130 X131 X132 X133 X134 X135 X136
## 0.3282735 0.2787811 0.2646994 0.2452368 0.2887026 0.2832019 0.2697465 0.1697001
## X137 X138 X139 X140 X141 X142 X143 X144
## 0.3424444 0.2231549 0.2678094 0.2583324 0.2000580 0.2090354 0.1867517 0.2243998
## X145 X146 X147 X148 X149 X150 X151 X152
## 0.2037349 0.2105233 0.2283949 0.1774096 0.2117993 0.3185335 0.1914717 0.2350078
## X153 X154 X155 X156 X157 X158 X159 X160
## 0.2374945 0.2676401 0.3228837 0.2749466 0.2019710 0.2920097 0.2312663 0.2536593
## X161 X162 X163 X164 X165 X166 X167 X168
## 0.3671672 0.1582853 0.1558215 0.1678399 0.2035401 0.2742840 0.3186933 0.2183891
## X169 X170 X171 X172 X173 X174 X175 X176
## 0.2096937 0.3644996 0.3914623 0.2643055 0.1844223 0.2111139 0.3290036 0.3469771
## X177 X178 X179 X180 X181 X182 X183 X184
## 0.2807223 0.3317799 0.2063050 0.3401475 0.2742733 0.2412250 0.3358014 0.4309520
## X185 X186 X187 X188 X189 X190 X191 X192
## 0.2440685 0.3638181 0.2681254 0.3560519 0.2610982 0.2326433 0.3871452 0.5114821
## X193 X194 X195 X196 X197 X198 X199 X200
## 0.3040894 0.4769509 0.4205419 0.2744101 0.3018878 0.3866420 0.4457668 0.3926145
##
## $negative
## X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16
## 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X17 X18 X19 X20 X21 X22 X23 X24 X25 X26 X27 X28 X29 X30 X31 X32
## 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X33 X34 X35 X36 X37 X38 X39 X40 X41 X42 X43 X44 X45 X46 X47 X48
## 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X49 X50 X51 X52 X53 X54 X55 X56 X57 X58 X59 X60 X61 X62 X63 X64
## 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X65 X66 X67 X68 X69 X70 X71 X72 X73 X74 X75 X76 X77 X78 X79 X80
## 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X81 X82 X83 X84 X85 X86 X87 X88 X89 X90 X91 X92 X93 X94 X95 X96
## 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X97 X98 X99 X100 X101 X102 X103 X104 X105 X106 X107 X108 X109 X110 X111 X112
## 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X113 X114 X115 X116 X117 X118 X119 X120 X121 X122 X123 X124 X125 X126 X127 X128
## 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X129 X130 X131 X132 X133 X134 X135 X136 X137 X138 X139 X140 X141 X142 X143 X144
## 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X145 X146 X147 X148 X149 X150 X151 X152 X153 X154 X155 X156 X157 X158 X159 X160
## 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X161 X162 X163 X164 X165 X166 X167 X168 X169 X170 X171 X172 X173 X174 X175 X176
## 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X177 X178 X179 X180 X181 X182 X183 X184 X185 X186 X187 X188 X189 X190 X191 X192
## 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X193 X194 X195 X196 X197 X198 X199 X200
## 0 0 0 0 0 0 0 0
# To summarize this for each network, we just create the average
tmp.df <- data.frame(network.connectivity = tmp.particpation.weighted.thr1$positive,
network = Community$names)
boxplot(tmp.df$network.connectivity~tmp.df$network)
tmp.df.mean <-
tmp.df %>%
group_by(network) %>%
summarise(network.connectivity = mean(network.connectivity))
print(tmp.df.mean)
## # A tibble: 7 x 2
## network network.connectivity
## * <chr> <dbl>
## 1 DAN 0.236
## 2 DMN 0.359
## 3 FPCN 0.285
## 4 Limbic 0.216
## 5 Motor 0.304
## 6 VAN 0.257
## 7 Visual 0.513
We can also compute the connection strength of a given network (i.e. nodes surrounded by a gray circle), to the remaining networks. An example of this would be, how strong the DMN is connected to the rest of the brain. This is again a measure of integration, as it quantifies the “cross-talk” between networks.
Next, lets compute the between network connectivity, using the comm.str command. We need three input arguments, A = the connectivity matrix comm = the community vector (i.e. which network does a node belong to) weighted = a boolean (TRUE/FALSE), stating whether the network is weighted or not
tmp.between.network.connectivity = comm.str(A = current.adj.matrix.cleaned.matrix.thr1,
comm = Community$names,
weighted = T,
measure = c("between"))
print(tmp.between.network.connectivity)
## DAN DMN FPCN Limbic Motor VAN Visual
## 6030.808 5414.221 6063.029 6666.588 5475.198 6166.624 6013.525
The output reflects the strength of connections from a given network to the remaining networks
Next, we want to determine a measure of network integration, called shortest path length. This measure quantifies, how many steps (or “connections”) a given node is away from any other node in the network on average. The measure was originally developed in social psychology, showing that a given person may know any other person in the world via just seven other people (or “connections”). This phenomenon was thus termed “small-world”, since networks usually contain short cuts that cross-link distant nodes (like highways). Accordingly, a network that is segregated in modules, but shows shortcuts between the modules is called a “small-world” network.
So what we will do is to determine the path length of each node, that is how “far” in terms of connectedness a given node is away from all the remaining nodes in the network. Again, the computation is super simple using the pathlengths command of the NetworkToolbox, but I recommend some reading if you’re interested understand the maths behind it (e.g. the paper by Rubinov & Sporns https://pubmed.ncbi.nlm.nih.gov/19819337/). As always, if you want more information on the pathlengths command, run help(“pathlengths”) from the console. The function requires two inputs, A = the adjacency matrix and weighted = a TRUE/FALSE statement indicating whether the matrix is weighted or not.
# compute path lengths
tmp.pathlengths <- pathlengths(current.adj.matrix.cleaned.matrix.thr1,
weighted = TRUE)
The output contains several measures, including a global measure of average shortest path length called ASPL, which is the mean path length of the entire network. A network with on average short path length tends to be relatively integrated with fast communication among its’ nodes. In addition, we get a path length measure for each node, called ASPLi
# average shortest path length of the entire network
tmp.pathlengths$ASPL
## [1] 3.888336
# average shortest path length of each node
tmp.pathlengths$ASPLi
## X1 X2 X3 X4 X5 X6 X7 X8
## 4.137293 4.184463 3.735641 4.309070 4.344773 4.292003 4.631718 3.742403
## X9 X10 X11 X12 X13 X14 X15 X16
## 4.248297 4.098558 3.682416 3.975370 4.207906 3.757072 3.991762 3.947241
## X17 X18 X19 X20 X21 X22 X23 X24
## 4.429823 3.848701 3.955869 3.891383 3.708027 3.519883 3.571361 3.613848
## X25 X26 X27 X28 X29 X30 X31 X32
## 3.553931 3.632140 3.517236 3.633475 3.624753 3.688718 4.172662 3.916317
## X33 X34 X35 X36 X37 X38 X39 X40
## 3.735735 3.447575 3.626929 3.538708 3.588726 3.416286 3.580433 3.636240
## X41 X42 X43 X44 X45 X46 X47 X48
## 3.515939 3.629537 3.524657 3.879971 3.715852 3.902311 4.017731 4.320219
## X49 X50 X51 X52 X53 X54 X55 X56
## 4.091283 3.762210 4.068812 3.720818 3.752102 3.544567 4.025393 3.703734
## X57 X58 X59 X60 X61 X62 X63 X64
## 4.810210 4.416315 4.793397 5.804495 3.935540 3.739433 3.474868 4.045231
## X65 X66 X67 X68 X69 X70 X71 X72
## 4.226915 4.300145 3.968504 4.189374 3.668042 3.685247 4.083494 4.255794
## X73 X74 X75 X76 X77 X78 X79 X80
## 4.131588 4.875339 4.607774 4.042741 3.624909 3.719150 3.721211 3.656196
## X81 X82 X83 X84 X85 X86 X87 X88
## 3.709893 3.700846 4.329797 3.901709 4.285693 4.187040 4.082270 3.945356
## X89 X90 X91 X92 X93 X94 X95 X96
## 3.692889 4.347305 3.649708 3.730928 3.844926 3.557564 3.724421 3.947067
## X97 X98 X99 X100 X101 X102 X103 X104
## 3.719596 3.901345 3.624851 4.857195 4.021027 4.684449 4.041035 4.107374
## X105 X106 X107 X108 X109 X110 X111 X112
## 3.631585 4.579801 4.079121 4.617198 4.053159 3.873140 3.979229 4.300800
## X113 X114 X115 X116 X117 X118 X119 X120
## 3.998248 4.080627 3.506995 3.675975 3.859801 3.980597 3.758583 3.850770
## X121 X122 X123 X124 X125 X126 X127 X128
## 3.704951 3.898133 3.733265 3.567895 3.555840 3.596136 3.607863 3.631067
## X129 X130 X131 X132 X133 X134 X135 X136
## 3.627041 3.678327 3.668946 3.642897 3.733911 3.735393 3.725477 3.558772
## X137 X138 X139 X140 X141 X142 X143 X144
## 3.545756 3.374482 3.466477 3.638288 3.514641 3.425488 3.491995 3.433322
## X145 X146 X147 X148 X149 X150 X151 X152
## 3.523025 3.547419 3.576696 3.653455 3.785955 3.620068 3.343464 3.920666
## X153 X154 X155 X156 X157 X158 X159 X160
## 3.843865 4.044058 3.931359 3.608339 3.634742 3.604867 4.371961 4.286255
## X161 X162 X163 X164 X165 X166 X167 X168
## 4.096086 4.986070 4.177409 4.178426 3.818206 3.703951 3.706176 3.740078
## X169 X170 X171 X172 X173 X174 X175 X176
## 4.407837 3.746634 4.070431 4.224749 3.517316 3.856489 3.898836 3.616137
## X177 X178 X179 X180 X181 X182 X183 X184
## 3.848135 4.000098 4.000410 4.249703 3.734309 3.627386 3.554890 3.673578
## X185 X186 X187 X188 X189 X190 X191 X192
## 5.235660 3.762313 3.939792 3.694639 3.715690 3.790315 3.611376 4.030151
## X193 X194 X195 X196 X197 X198 X199 X200
## 4.600171 3.607241 3.626582 3.779122 3.562657 3.798590 3.559637 3.561593
hist(tmp.pathlengths$ASPLi)
Speaking of small-world networks, we can also directly quantify how much a given network architecture resembles a small world network. Here’s a short definition of small-world networks:
A small-world network is a type of mathematical graph in which most nodes are not neighbors of one another, but the neighbors of any given node are likely to be neighbors of each other and most nodes can be reached from every other node by a small number of hops or steps.
In other words, small-world networks tend to be an intermediate phenotype between a regular network and a random network (see figure below). In biology, networks tend to orient towards small-world topology (see for instance https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3604768/), which seems to be a particularly cost-efficient topology.
To compute the “small-worldness” of a given network, we can use the smallworldness command of the NetworkToolbox. This requires several inputs. A = an adjacency matrix, iter = the number of iteratively generated random networks with equivalent weight and degree distribution that are generated against which “your” network is compared against. Note that computing this parameter takes some time, and the computation time gets longer the more iterations you select. For help, call help(“smallworldness”) from the console.
# set a fixed seed to get reproducible results when generating random data
set.seed(42)
# compute small-worldness coefficient
tmp.smallworldness <- smallworldness(current.adj.matrix.cleaned.matrix.thr1,
iter = 10,
progBar = FALSE,
method = "HG")
# show small-world coefficient
print(tmp.smallworldness$swm)
## [1] 1.193477
It’s also quite useful to compare different graph theoretical metrics, because many of them are highly inter-related. When we compare the average shortest path length (x-axis) as a measure of integration with another measure of segregation (e.g. clustering, y-axis), we can see that nodes with short path length also tend to show higher clustering. This is because a node with many connections to his neighbors can also reach the remaining network faster via less connections.
plot(tmp.pathlengths$ASPLi, tmp.clustering.weighted.thr1$CCi)
rcorr(tmp.pathlengths$ASPLi, tmp.clustering.weighted.thr1$CCi)
## x y
## x 1.00 -0.24
## y -0.24 1.00
##
## n= 200
##
##
## P
## x y
## x 7e-04
## y 7e-04
You can also have a look at how these metrics distribute across the different networks
# path length
boxplot(tmp.pathlengths$ASPLi~Community$names)
# clustering
boxplot(tmp.clustering.weighted.thr1$CCi~Community$names)
# degree
boxplot(tmp.degree.weighted.thr1~Community$names)
Now that you’ve learned about some basic graph theoretical parameters (there are many more!), it’s time to learn how to compute these measures for larger groups of subjects. Running the code above “by hand” for every individual from the current dataset would possible take ages (…and it would be super boring…). So lets do what I call “applied laziness” and run everything automatically using looped scripts and functions. The basic idea is that we define a set of fixed analysis steps and repeat them iteratively across many individuals. The simplest way to do this is to use for loops (see for instance https://www.datacamp.com/community/tutorials/tutorial-on-loops-in-r). And while the computer is getting the work done for you, you can go and do whatever you want…while still getting paid for your job… ;)
So we need to define a basic analysis scheme that we would like to apply to our data. For a given subject, we first need to read in the connectivity matrix, apply some thresholding and preprocessing, and then compute measures of clustering, degree, community strength, path-length and small-worldness. Basically we’ll repeat all the steps above in a single script. To keep it rather simple, we compute the clustering coefficient, degree, and path-length measures for each node and then summarize these measures for the whole brain and for the seven different networks. Note that we will overwrite some of the file names in the code above.
# define your input data
input.data = ADNI.data.bl
# define the number of iterations that need to be performed.
# In our data frame, each row corresponds to one subject, so we want to iterate across rows.
# Thus, the number of iterations corresponds to the number of rows of our input data
n.iter = nrow(input.data)
print(paste0("number of iterations = ", n.iter))
# Next, open the loop, and define that it runs all
# the way from 1 to the number of iterations that you've defined as n.iter
# within the loop, you will use the letter i which will change in each iteration from 1, to 2, to 3 to n.iter
n = 0
for (i in 1:n.iter){
svMisc::progress(i, max.value = n.iter)
# set an internal counter within the loop (can be helpful sometimes!)
n = n +1
# select the i-th observation in the data frame
current.data = input.data[i,]
# import the connectivity matrix
current.adj.matrix = read.csv(current.data$FC.mat.200[1])
# remove first column
current.adj.matrix.cleaned <- current.adj.matrix[,-1]
# clean diagnonal (i.e. remove infinite values with zeros)
diag(current.adj.matrix.cleaned) = 0
# transform to matrix format
current.adj.matrix.cleaned.matrix <-
as.matrix(current.adj.matrix.cleaned)
# remove negative connections and perform density
# thresholding at a density of 1
current.adj.matrix.cleaned.matrix.thr1 <-
adj_mat_density_thresh(
adj_matrix = current.adj.matrix.cleaned.matrix,
retain_negatives = F,
density = 1,
replace_with_NAs = F)
# compute clustering coefficient
current.clustering.weighted.thr1 <-
clustcoeff(current.adj.matrix.cleaned.matrix.thr1,
weighted = T)
# compute degree
current.degree.weighted.thr1 <-
colMeans(current.adj.matrix.cleaned.matrix.thr1,
na.rm = T)
# compute community strength
current.network.connectivity.between =
comm.str(A = current.adj.matrix.cleaned.matrix.thr1,
comm = Community$names,
weighted = T,
measure = c("between"))
# compute participation coefficient
current.network.connectivity.within =
participation(current.adj.matrix.cleaned.matrix.thr1,
comm = Community$names)
# compute path lengths
current.pathlengths <- pathlengths(current.adj.matrix.cleaned.matrix.thr1,
weighted = TRUE)
## compute small-worldness coefficient
#current.smallworldness <- smallworldness(current.adj.matrix.cleaned.matrix.thr1,
# iter = 1,
# progBar = FALSE,
# method = "HG")
# summarize clustering, degree and path length by networks
# to this end we first create a data frame with all the data and the network affiliation vector
tmp.df = data.frame(clustering =
current.clustering.weighted.thr1$CCi,
degree =
current.degree.weighted.thr1,
network.connectivity.within =
current.network.connectivity.within$positive,
pathlengths =
current.pathlengths$ASPLi,
network =
Community$names)
# next we create the mean of each metric for each network in long format
tmp.df.summary <-
tmp.df %>%
group_by(network) %>%
summarise(clustering = mean(clustering),
degree = mean(degree),
pathlength = mean(pathlengths),
network.connectivity.within = mean(network.connectivity.within))
# add subject ID
tmp.df.summary$ID <- current.data$ID
# concatenate network-specific data
if(n == 1){tmp.df.summary.concat = tmp.df.summary}
if(n > 1){tmp.df.summary.concat = rbind(tmp.df.summary.concat, tmp.df.summary)}
# forward small-worldness, average shortest path length and community strength to the main data frame
#current.data$smallworldness <- current.smallworldness$swm
current.data$ASPL <- current.pathlengths$ASPL
current.data$degree.global = mean(current.degree.weighted.thr1)
current.data$clustering.global = current.clustering.weighted.thr1$CC
current.data$network.connectivity.within.global =
mean(current.network.connectivity.within$overall)
# adding in the community strength data is a bit more complicated,
# let me know if you want that explained in further detail
current.network.connectivity.between.reshaped =
data.frame(t(current.network.connectivity.between))
names(current.network.connectivity.between.reshaped) <-
paste0("network.connectivity.between.",
names(current.network.connectivity.between.reshaped))
current.data <-
cbind(current.data, current.network.connectivity.between.reshaped)
# concatenate main data frame
if (n == 1){current.data.concat = current.data}
if (n > 1){current.data.concat = rbind(current.data.concat,current.data)}
}
# reshape network specific path length, clustering and degree
tmp.df.summary.concat2 <-
data.frame(tmp.df.summary.concat)
tmp.df.wide =
reshape(tmp.df.summary.concat2,
idvar = "ID",
timevar = "network",
direction = "wide")
# merge all data together
output.data = merge(current.data.concat, tmp.df.wide, by = "ID")
Since computations may take time, it’s always a good idea to save your output from time to time. Since our for-loop has generated quite some data, it’s time to save an updated spreadsheet, which we can load back in afterwards and continue from there. We can easily write excel spreadsheets using the write_xlsx function from writexl package. All you need is a data frame x that you want to save as well as a full path to an output file
write_xlsx(x = output.data,
path = paste0(dir.root, "/data_sheets/ADNI_fMRI_PET_GraphTheory.xlsx"))
first, import the data frame that we’ve saved before
ADNI.data.bl.graphtheory <- read_xlsx(paste0(dir.root, "/data_sheets/ADNI_fMRI_PET_GraphTheory.xlsx"))
ADNI.data.bl.graphtheory$DX.Ab <- factor(ADNI.data.bl.graphtheory$DX.Ab,
levels = c("CN.Ab.neg",
"CN.Ab.pos",
"MCI.Ab.pos",
"Dementia.Ab.pos"))
Previous studies have suggested that the DMN is disrupted in Alzheimer’s disease patients. Attached, find a figure from a cohort with autosomal dominantly inherited AD patients (i.e. a rare genetic form of the disease, which becomes symptomatic around the age of 40). Here, you can clearly see that the DMN becomes less pronounced in patients with autosomal dominantly inherited AD (right) vs. controls (left). By the way, if you want to learn more about the DMN, I recommend this excellent review by Randy Buckner: https://www.nature.com/articles/s41583-019-0212-7
Similar observations of DMN disruption have been made in patients with sporadic AD. The DMN has been suggested to be crucial for episodic memory performance, hence DMN disruptions have been argued to drive memory decline.
First, lets plot a simple measure of within DMN connectivity across controls and AD patients. We have computed this measure in our for-loop and saved it as network.connectivity.within.DMN in the ADNI.data.bl.graphtheory data frame. We use ggplot again to create a figure
ggplot(data = ADNI.data.bl.graphtheory,
aes(
x = DX.Ab,
y = network.connectivity.within.DMN)) +
geom_boxplot(notch = T, alpha = 0.1) +
geom_beeswarm() +
theme_minimal()
As you can see, there is a decrease in DMN connectivity across the AD spectrum. But is this decrease significant? Lets use an ANCOVA to test this.
tmp.aov <- aov(data = ADNI.data.bl.graphtheory,
network.connectivity.within.DMN ~ DX.Ab + age + sex); summary(tmp.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## DX.Ab 3 0.0317 0.010579 3.936 0.00886 **
## age 1 0.0099 0.009858 3.668 0.05640 .
## sex 1 0.0241 0.024105 8.969 0.00297 **
## Residuals 309 0.8305 0.002688
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Indeed, the data shows that the connectivity within the DMN decreases across the AD spectrum.
TukeyHSD(tmp.aov, which = "DX.Ab")
## Warning in replications(paste("~", xx), data = mf): non-factors ignored: age
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = network.connectivity.within.DMN ~ DX.Ab + age + sex, data = ADNI.data.bl.graphtheory)
##
## $DX.Ab
## diff lwr upr p adj
## CN.Ab.pos-CN.Ab.neg 0.001464213 -0.01753724 0.020465663 0.9972029
## MCI.Ab.pos-CN.Ab.neg -0.008160736 -0.03220387 0.015882401 0.8169144
## Dementia.Ab.pos-CN.Ab.neg -0.041177880 -0.07338402 -0.008971739 0.0058690
## MCI.Ab.pos-CN.Ab.pos -0.009624949 -0.03705353 0.017803628 0.8014350
## Dementia.Ab.pos-CN.Ab.pos -0.042642093 -0.07744856 -0.007835630 0.0092136
## Dementia.Ab.pos-MCI.Ab.pos -0.033017144 -0.07081274 0.004778455 0.1106865
A post-hoc Tukey test shows that the patients with AD dementia mostly drive this effect
Next, lets have a look whether the connectivity of the DMN to the remainig brain also changes across the AD spectrum. We have also computed this measure in our for-loop and saved it as network.connectivity.between.DMN in the ADNI.data.bl.graphtheory data frame. So same procedure as above….
ggplot(data = ADNI.data.bl.graphtheory,
aes(
x = DX.Ab,
y = network.connectivity.between.DMN)) +
geom_boxplot(notch = T, alpha = 0.1) +
geom_beeswarm() +
theme_minimal()
As you can see, there is a rather an increase in DMN connectivity to the remaining brain across the AD spectrum. This suggests some sort of de-differentiation of network activity (i.e. reduced within but increased between network connectivity) But is this decrease significant?
tmp.aov <- aov(data = ADNI.data.bl.graphtheory,
network.connectivity.between.DMN ~ DX.Ab + age + sex + PTEDUCAT); summary(tmp.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## DX.Ab 3 8.180e+07 27267025 1.658 0.176116
## age 1 4.469e+07 44686739 2.717 0.100309
## sex 1 1.931e+08 193068744 11.738 0.000695 ***
## PTEDUCAT 1 3.394e+07 33936798 2.063 0.151895
## Residuals 308 5.066e+09 16447579
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Nope….So to sum up: What we find is a decrease in DMN connectivity across the AD spectrum. But is this decrease in connectivity really related to memory? Lets have a look at a scatterplot first
ggplot(data = ADNI.data.bl.graphtheory,
aes(
x = network.connectivity.within.DMN,
y = ADNI_MEM)) +
geom_point() +
geom_smooth(method = "lm") +
theme_minimal()
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).
Next, lets check the correlation between DMN connectivity and memory performance
# compute correlation
rcorr(ADNI.data.bl.graphtheory$network.connectivity.within.DMN,
ADNI.data.bl.graphtheory$ADNI_MEM)
## x y
## x 1.00 0.12
## y 0.12 1.00
##
## n
## x y
## x 315 314
## y 314 314
##
## P
## x y
## x 0.0359
## y 0.0359
Looks like there is a slight correlation between decreasing within DMN connectivity and decreasing memory performance. But keep in mind that “healthy controls” (i.e. CN.Ab.neg) are also included in this plot and the correlation analysis. So in order to specifically investigate the role of the DMN in AD patients, we need to create a subset of the data, that is restricted to AD patients. To this end, we subset the data fram ADNI.data.bl.graphtheory to subjects whose diagnosis is unequal to (i.e. !=) “CN.Ab.neg”.
# we store our subset in a dedicated data frame
ADNI.data.bl.graphtheory.AD.only <- subset(ADNI.data.bl.graphtheory, DX.Ab!="CN.Ab.neg")
Now we’re left with AD patients only as you can see from the table below.
table(ADNI.data.bl.graphtheory.AD.only$DX.Ab)
##
## CN.Ab.neg CN.Ab.pos MCI.Ab.pos Dementia.Ab.pos
## 0 67 37 19
So lets have a look again at the association between network connectivity and memory performance, using our subset of AD patients only:
ggplot(data = ADNI.data.bl.graphtheory.AD.only,
aes(
x = network.connectivity.within.DMN,
y = ADNI_MEM)) +
geom_point() +
geom_smooth(method = "lm") +
theme_minimal()
## `geom_smooth()` using formula 'y ~ x'
# compute correlation
rcorr(ADNI.data.bl.graphtheory.AD.only$network.connectivity.within.DMN,
ADNI.data.bl.graphtheory.AD.only$ADNI_MEM)
## x y
## x 1.00 0.23
## y 0.23 1.00
##
## n= 123
##
##
## P
## x y
## x 0.0117
## y 0.0117
Looks a bit stronger! So together, we found that DMN connectivity decreases across the AD spectrum, and that a decrease in DMN connectivity is associated with lower memory performance.
Next, you can try to run some additional analyses. For example, you can try to see connectivity of networks other than the DMN change throughout the disease course and whether they’re related to cognitive function. Also, try to see how graph theoretical parameters are inter-related and how they relate to the AD groups.